

Search for: All records

Creators/Authors contains: "Zeng, Qingkai"


  1. Taxonomies serve many applications with a structural representation of knowledge. To incorporate emerging concepts into existing taxonomies, the task of taxonomy completion aims to find suitable positions for emerging query concepts. Previous work used pre-trained language models to capture homogeneous token-level interactions within the concatenation of a query concept's term and definition, but ignored the token-level interactions between the term and definition of the query concept and those of its related concepts. In this work, we propose to capture heterogeneous token-level interactions between the different textual components of concepts that hold different types of relations. We design a relation-aware mutual attention module (RAMA) to learn such interactions for taxonomy completion. Experimental results demonstrate that our new taxonomy completion framework based on RAMA achieves state-of-the-art performance on six taxonomy datasets.
    Free, publicly-accessible full text available July 1, 2024
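    A minimal sketch of the relation-aware mutual attention idea described above, assuming a PyTorch setting; the module, parameter, and relation names are illustrative assumptions, not taken from the paper's released code:

    import torch
    import torch.nn as nn

    # Hypothetical relation-aware mutual (cross) attention between the tokens of a
    # query concept (term + definition) and the tokens of a related concept.
    class RelationAwareMutualAttention(nn.Module):
        def __init__(self, dim=32, relation_types=("parent", "child"), heads=4):
            super().__init__()
            # one cross-attention block per relation type -> heterogeneous interactions
            self.attn = nn.ModuleDict({
                r: nn.MultiheadAttention(dim, heads, batch_first=True)
                for r in relation_types
            })
            self.score = nn.Linear(dim, 1)

        def forward(self, query_tokens, related_tokens, relation):
            # query_tokens: (batch, q_len, dim); related_tokens: (batch, r_len, dim)
            attended, _ = self.attn[relation](query_tokens, related_tokens, related_tokens)
            pooled = attended.mean(dim=1)          # mean-pool the attended query tokens
            return self.score(pooled).squeeze(-1)  # plausibility of attaching here

    model = RelationAwareMutualAttention()
    q, c = torch.randn(2, 10, 32), torch.randn(2, 12, 32)
    print(model(q, c, relation="parent").shape)    # -> torch.Size([2])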
  2. People look for complementary contexts, such as team members with complementary skills when building a project team or reading materials with complementary knowledge for effective student learning, to make their behaviors more likely to succeed. Behavioral sciences have identified complementarity as one of the most important factors in decision making. Existing computational models that learn low-dimensional context representations from behavior data scale poorly, and recent network embedding methods focus only on preserving similarity between contexts. In this work, we formulate a behavior entry as a set of context items and propose a novel representation learning method, Multi-type Itemset Embedding, to learn context representations that preserve itemset structures. We also propose a measurement of complementarity between context items in the embedding space. Experiments demonstrate both the effectiveness and the efficiency of the proposed method over state-of-the-art methods on behavior prediction and context recommendation. We discover that complementary contexts and similar contexts are significantly different in human behaviors.
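    A toy illustration of the distinction drawn above between similarity and complementarity in an embedding space; the complementarity measure below is a hypothetical stand-in, not the paper's actual formulation:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def itemset_vector(items, emb):
        # represent an itemset by the normalized sum of its members' embeddings
        v = np.sum([emb[i] for i in items], axis=0)
        return v / (np.linalg.norm(v) + 1e-9)

    def complementarity(item, itemset, target, emb):
        # hypothetical measure: how much adding `item` moves the itemset's
        # representation toward a target "successful behavior" embedding
        before = cosine(itemset_vector(itemset, emb), target)
        after = cosine(itemset_vector(itemset + [item], emb), target)
        return after - before

    rng = np.random.default_rng(0)
    emb = {name: rng.normal(size=16) for name in ("backend", "frontend", "design")}
    target = emb["backend"] + emb["frontend"]      # toy direction of a successful team
    print("similarity     :", cosine(emb["backend"], emb["frontend"]))
    print("complementarity:", complementarity("frontend", ["backend"], target, emb))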
  3. Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been limited by the availability of labeled data and by runtime efficiency. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy that enables trace models to transfer knowledge from a closely related software engineering challenge with a rich dataset, producing trace links with much higher accuracy than previously achieved. We then apply the T-BERT framework to recover links between issues and commits in open source projects. We comparatively evaluated the accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-world OSS projects, the best T-BERT consistently outperformed the VSM model, with an average improvement of 60.31% in Mean Average Precision (MAP). The RNN model severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.
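    A minimal sketch contrasting the Siamese and single-encoder setups compared above, using a small stand-in encoder rather than the pretrained BERT models evaluated in the paper; all names and shapes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class TinyEncoder(nn.Module):
        # small stand-in for the pretrained encoder; pools token states to one vector
        def __init__(self, vocab=1000, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, ids):                         # ids: (batch, seq_len)
            return self.enc(self.emb(ids)).mean(dim=1)  # (batch, dim)

    encoder = TinyEncoder()
    head = nn.Linear(64, 1)

    def siamese_score(nl_ids, code_ids):
        # separate encodings compared by cosine; code vectors can be precomputed,
        # which is what makes the Siamese variant fast at retrieval time
        return nn.functional.cosine_similarity(encoder(nl_ids), encoder(code_ids))

    def single_score(nl_ids, code_ids):
        # joint encoding of the concatenated pair; typically more accurate but slower
        return head(encoder(torch.cat([nl_ids, code_ids], dim=1))).squeeze(-1)

    nl = torch.randint(0, 1000, (2, 16))
    code = torch.randint(0, 1000, (2, 32))
    print(siamese_score(nl, code).shape, single_score(nl, code).shape)  # (2,) (2,)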
  4. Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering. Existing taxonomy expansion or completion methods assume that new concepts have been accurately extracted and that their embedding vectors have been learned from the text corpus. However, one critical and fundamental challenge in fixing the incompleteness of taxonomies is the incompleteness of the extracted concepts, especially for concepts whose names contain multiple words and therefore appear with low frequency in the corpus. To overcome the limitations of extraction-based methods, we propose GenTaxo, which enhances taxonomy completion by identifying positions in existing taxonomies that need new concepts and then generating appropriate concept names. Instead of relying on the corpus for concept embeddings, GenTaxo learns contextual embeddings for these positions from their surrounding graph-based and language-based relational information, and leverages the corpus to pre-train a concept name generator. Experimental results demonstrate that GenTaxo improves the completeness of taxonomies over existing methods.
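    A schematic sketch of the two-stage idea described above (scoring positions that may be missing a concept, then generating a name for the chosen position); both functions are hypothetical stand-ins for the learned components:

    # toy taxonomy: parent -> list of children
    taxonomy = {
        "machine learning": ["supervised learning", "unsupervised learning"],
        "supervised learning": ["classification"],
        "unsupervised learning": [],
        "classification": [],
    }

    def position_score(parent, children):
        # stand-in heuristic: sparsely populated parents may be missing concepts;
        # the paper instead learns this from graph- and text-based context
        return 1.0 / (1 + len(children))

    def generate_name(parent, children):
        # stand-in for the corpus-pretrained concept-name generator
        return f"<new child of '{parent}', siblings: {children or 'none'}>"

    parent, children = max(taxonomy.items(), key=lambda kv: position_score(*kv))
    print("position needing a concept:", parent)
    print("generated concept name    :", generate_name(parent, children))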
  5. In many regulated domains, traceability is established across diverse artifacts such as requirements, design, code, test cases, and hazards, either manually or with the help of supporting tools, and the resulting trace links are used to support activities such as impact analysis, compliance verification, and safety inspections. Automated tracing techniques need to leverage the semantics of the underlying artifacts in order to establish more accurate trace links and to explain links that have been created either manually or automatically. To support this, we propose an automated technique that leverages source code, project artifacts, and an external domain corpus to generate a domain-specific concept model. We then use the generated concept model to improve traceability results and to provide explanations of those results. Our approach overcomes existing problems with deep-learning traceability algorithms, as it does not require a training set of existing trace links. Finally, as an initial proof of concept, we apply our semantically guided approach to the Dronology project and show that it improves over other tracing techniques that do not use a concept model.
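    A simplified sketch of how a domain concept model could bridge requirement terms and code identifiers while doubling as a link explanation; the tiny hand-written concept model and the scoring rule below are assumptions for illustration only:

    # tiny hand-written concept model; the real one is generated from code,
    # project artifacts, and an external domain corpus
    concept_model = {
        "altitude": {"elevation", "height"},
        "uav": {"drone", "vehicle"},
    }

    def expand(terms):
        expanded = set(terms)
        for t in terms:
            expanded |= concept_model.get(t, set())
        return expanded

    def trace(requirement_terms, code_terms):
        shared = expand(requirement_terms) & expand(code_terms)
        score = len(shared) / max(len(requirement_terms), 1)
        return score, shared    # the shared concepts double as the explanation

    req = {"uav", "altitude", "limit"}
    code = {"drone", "elevation", "check"}
    score, why = trace(req, code)
    print(f"link score={score:.2f}, explained by shared concepts: {sorted(why)}")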